3574 results found.
Written
Evaluation Data
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
CC BY 4.0
Size:
155 MByte
Production Status:
Newly created-finished
Use:
Knowledge Discovery/Representation
- Paper title: Evaluation Dataset and Methodology for Extracting Application-Specific Taxonomies from the Wikipedia Knowledge Graph
- Paper track: Evaluation/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Georgeta Bordea | Evaluation Benchmark for Domain Taxonomies from Knowledge Graphs (EBDT-KG) | /N |
Documentation:
None
Written
Corpus
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Creative Commons
Size:
4000000 tokens
Production Status:
Newly created-in progress
Use:
Parsing and Tagging
- Paper title: GUMBY – A Free, Balanced, and Rich English Web Corpus
- Paper track: Evaluation/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Luke Gessler | AMALGUM | /N |
Documentation:
None yet
Written
Corpus
Language Type:
Bilingual
Languages:
English German
Availability:
Freely Available
License:
Creative Commons
Size:
1627 Summaries Other
Production Status:
Newly created-finished
Use:
Summarisation
- Paper title: Summarization Beyond News: The Automatically Acquired Fandom Corpora
- Paper track: Evaluation/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Benjamin Hättasch | FandomCorpora | /N |
Documentation:
Readme on page/in the repository
Speech
Corpus
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Creative Commons Attribution-ShareAlike 4.0 International Public License
Size:
10 GByte
Production Status:
Existing-used
Use:
Meta-data analysis & gender exploration
- Paper title: Gender Representation in Open Source Speech Resources
- Paper track: Speech/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mahault Garnerin | Crowdsourced high-quality UK and Ireland English Dialect speech data set | /N |
Documentation:
One html file in English
Speech
Corpus
Language Type:
Bilingual
Languages:
Czech English
Availability:
License:
Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0 US)
Size:
7.3 GByte
Production Status:
Existing-used
Use:
Meta-data analysis & gender exploration
- Paper title: Gender Representation in Open Source Speech Resources
- Paper track: Speech/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mahault Garnerin | Vystadial | /N |
Documentation:
A proceeding paper in English
Speech
Corpus
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
CC BY 4.0
Size:
64 GByte
Production Status:
Existing-used
Use:
Meta-data analysis & gender exploration
- Paper title: Gender Representation in Open Source Speech Resources
- Paper track: Speech/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mahault Garnerin | Librispeech | /N |
Documentation:
A proceeding paper in English
Speech
Corpus
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
Creative Commons BY-NC-ND 3.0
Size:
63.4 GByte
Production Status:
Existing-used
Use:
Meta-data analysis & gender exploration
- Paper title: Gender Representation in Open Source Speech Resources
- Paper track: Speech/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Mahault Garnerin | TED-LIUM Release 3 | /N |
Documentation:
A proceeding paper (in English)
Written
Corpus Tool
Language Type:
Monolingual
Languages:
English
Availability:
Freely Available
License:
BSD-3-Clause
Size:
5.65 MByte
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues
- Paper track: Multimodality/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Francisco Javier Chiyah Garcia | CRWIZ | /N |
Documentation:
None
Written
Language Resources/Technologies Infrastructure
Language Type:
Bilingual
Languages:
Arabic English
Availability:
Freely Available
License:
Open Source
Size:
10266304 tokens
Production Status:
Newly created-finished
Use:
Evaluation/Validation
- Paper title: Constructing a Bilingual Hadith Corpus Using a Segmentation Tool
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Shatha Altammami | LK Hadith Corpus | /N |
Documentation:
None
Speech/Written
Corpus
Language Type:
Multilingual
Languages:
Arabic English Mandarin Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
2.9 million words
Production Status:
Existing-used
Use:
- Paper title: Fine-grained Named Entity Annotations for German Biographic Interviews
- Paper track: Evaluation/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Josef Ruppenhofer | OntoNotes Release 5.0 | /N |
Documentation:
None




